Simulation and implementation of novel deep learning hardware architectures for resource constrained devices
Corey Lammie designed mixed-signal memristive-Complementary Metal Oxide Semiconductor (CMOS) and Field Programmable Gate Array (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems during both inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems.
Accelerating Deterministic and Stochastic Binarized Neural Networks on FPGAs Using OpenCL
Recent technological advances have greatly increased the available computing
power, memory, and speed of modern Central Processing Units (CPUs), Graphics
Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs).
Consequently, the performance and complexity of Artificial Neural Networks
(ANNs) are burgeoning. While GPU-accelerated Deep Neural Networks (DNNs)
currently offer state-of-the-art performance, they consume large amounts of
power. Training such networks on CPUs is inefficient, as data throughput and
parallel computation are limited. FPGAs are considered a suitable candidate for
performance-critical, low-power systems, e.g., Internet of Things (IoT) edge
devices. Using the Xilinx SDAccel or Intel FPGA SDK for OpenCL development
environment, networks described using the high-level OpenCL framework can be
accelerated on heterogeneous platforms. Moreover, the resource utilization and
power consumption of DNNs can be further reduced by utilizing regularization
techniques that binarize network weights. In this paper, we introduce, to the
best of our knowledge, the first FPGA-accelerated stochastically binarized DNN
implementations, and compare them to implementations accelerated using both
GPUs and FPGAs. Our developed networks are trained and benchmarked using the
popular MNIST and CIFAR-10 datasets, and achieve near state-of-the-art
performance, while offering a >16-fold reduction in power consumption compared
to conventional GPU-accelerated networks. Both our FPGA-accelerated
deterministic and stochastic BNNs reduce inference times on MNIST and CIFAR-10
by >9.89x and >9.91x, respectively.
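To make the contrast between the two schemes concrete, the following minimal NumPy sketch implements deterministic and stochastic weight binarization in the style popularized by BinaryConnect; it illustrates the arithmetic only and is not the paper's OpenCL implementation (function names are illustrative):

```python
import numpy as np

def hard_sigmoid(x):
    """Clip (x + 1) / 2 to [0, 1]; used as the binarization probability."""
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)

def binarize(w, stochastic=False, rng=None):
    """Binarize real-valued weights to {-1, +1}.

    Deterministic: sign(w).
    Stochastic: +1 with probability hard_sigmoid(w), else -1.
    """
    if stochastic:
        rng = rng or np.random.default_rng()
        return np.where(rng.random(w.shape) < hard_sigmoid(w), 1.0, -1.0)
    return np.where(w >= 0.0, 1.0, -1.0)

w = np.random.default_rng(0).standard_normal((4, 4))
print(binarize(w))                   # deterministic
print(binarize(w, stochastic=True))  # stochastic
```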
Digital, analog, and memristive implementation of Spike-based Synaptic Plasticity
Synaptic plasticity is believed to play an essential role in learning and memory in the brain. To date, many plasticity algorithms have been devised, some of which have been confirmed in electrophysiological experiments. Perhaps the most popular synaptic plasticity rule, or learning algorithm, among neuromorphic engineers is Spike Timing Dependent Plasticity (STDP). The conventional form of STDP has been implemented in various forms by many groups using different hardware approaches, and has been used for applications such as pattern classification. However, a newer form of STDP, which elicits synaptic efficacy modification based on the timing among a triplet of pre- and post-synaptic spikes, has not been well explored in hardware.
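For readers unfamiliar with the triplet rule referenced above, the sketch below implements the Pfister-Gerstner triplet STDP model with all-to-all spike interactions in Python. The default constants follow the published visual-cortex fit and should be treated as placeholders, not as values used in this work:

```python
import numpy as np

def triplet_stdp(pre_spikes, post_spikes, w0=0.5,
                 tau_plus=16.8, tau_minus=33.7, tau_x=101.0, tau_y=125.0,
                 a2_plus=5e-10, a2_minus=7e-3, a3_plus=6.2e-3, a3_minus=2.3e-4):
    """Pfister-Gerstner triplet STDP with all-to-all interactions.

    pre_spikes / post_spikes: spike times in ms. Traces r1/r2 track
    pre-synaptic activity; o1/o2 track post-synaptic activity.
    """
    events = sorted([(t, 'pre') for t in pre_spikes] +
                    [(t, 'post') for t in post_spikes])
    r1 = r2 = o1 = o2 = 0.0
    w, t_last = w0, 0.0
    for t, kind in events:
        dt = t - t_last
        # Exponentially decay all traces up to the current event time.
        r1 *= np.exp(-dt / tau_plus)
        r2 *= np.exp(-dt / tau_x)
        o1 *= np.exp(-dt / tau_minus)
        o2 *= np.exp(-dt / tau_y)
        if kind == 'pre':   # depression, gated by post-synaptic traces
            w -= o1 * (a2_minus + a3_minus * r2)
            r1 += 1.0
            r2 += 1.0
        else:               # potentiation, gated by pre-synaptic traces
            w += r1 * (a2_plus + a3_plus * o2)
            o1 += 1.0
            o2 += 1.0
        t_last = t
    return w

# A post-synaptic spike shortly after each pre-synaptic spike
# yields net potentiation under these parameters.
print(triplet_stdp(pre_spikes=[10.0, 50.0], post_spikes=[15.0, 55.0]))
```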
Modeling and simulating in-memory memristive deep learning systems: an overview of current efforts
Deep Learning (DL) systems have demonstrated unparalleled performance in many challenging engineering applications. As the complexity of these systems inevitably increases, they require increased processing capabilities and consume larger amounts of power, which are not readily available in resource-constrained processors, such as Internet of Things (IoT) edge devices. Memristive In-Memory Computing (IMC) systems for DL, termed Memristive Deep Learning Systems (MDLSs), perform the computation and storage of repetitive operations in the same physical location using emerging memory devices, and can be used to augment the performance of traditional DL architectures, massively reducing their power consumption and latency. However, memristive devices, such as Resistive Random-Access Memory (RRAM) and Phase-Change Memory (PCM), are difficult and cost-prohibitive to fabricate in small quantities, and are prone to various device non-idealities that must be accounted for. Consequently, the popularity of simulation frameworks, used to simulate MDLSs prior to circuit-level realization, is burgeoning. In this paper, we provide a survey of existing simulation frameworks and related tools used to model large-scale MDLSs. Moreover, we perform direct performance comparisons of modernized open-source simulation frameworks, and provide insights into future modeling and simulation strategies and approaches. We hope that this treatise is beneficial to the broader computer and electrical engineering community, and can help readers better understand available tools and techniques for MDLS development.
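The core idea behind memristive IMC, which the surveyed frameworks all simulate in some form, reduces to a single matrix product: applying voltages to the rows of a conductance matrix yields column currents equal to a Multiply-Accumulate. A minimal, idealized sketch (device values are arbitrary):

```python
import numpy as np

def crossbar_mac(v_in, g):
    """Ideal analog in-memory MAC: row voltages applied across a
    conductance matrix produce column currents I = V @ G (Ohm's law
    plus Kirchhoff's current law), so the multiply and the memory
    access happen in the same physical location."""
    return v_in @ g

v = np.array([0.1, 0.2, 0.0, 0.3])  # row voltages (V)
g = np.full((4, 3), 5e-5)           # device conductances (S)
print(crossbar_mac(v, g))           # column currents (A)
```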
Training Progressively Binarizing Deep Networks Using FPGAs
While hardware implementations of inference routines for Binarized Neural
Networks (BNNs) are plentiful, current realizations of efficient BNN hardware
training accelerators, suitable for Internet of Things (IoT) edge devices,
leave much to be desired. Conventional BNN hardware training accelerators
perform forward and backward propagations with parameters adopting binary
representations, and optimization using parameters adopting floating or
fixed-point real-valued representations, requiring two distinct sets of network
parameters. In this paper, we propose a hardware-friendly training method that,
contrary to conventional methods, progressively binarizes a singular set of
fixed-point network parameters, yielding notable reductions in power and
resource utilizations. We use the Intel FPGA SDK for OpenCL development
environment to train our progressively binarizing DNNs on an OpenVINO FPGA. We
benchmark our training approach on both GPUs and FPGAs using CIFAR-10 and
compare it to conventional BNNs. (Accepted at the 2020 IEEE International Symposium on Circuits and Systems, ISCAS.)
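One common way to realize progressive binarization, illustrated in the PyTorch sketch below, is to pass a single set of parameters through a tanh whose gain is annealed upward during training, so the forward pass smoothly approaches sign(w). The gain schedule and toy task here are illustrative assumptions, not necessarily the paper's exact method:

```python
import torch

def progressive_binarize(w, gain):
    """tanh(gain * w) approaches sign(w) as the gain grows, so one
    set of parameters is annealed from real-valued toward binary
    over the course of training."""
    return torch.tanh(gain * w)

# Toy regression with a single linear map whose weights are
# progressively binarized (illustrative gain schedule).
torch.manual_seed(0)
x = torch.randn(256, 16)
y = x @ torch.sign(torch.randn(16, 1))  # binary ground truth
w = torch.randn(16, 1, requires_grad=True)
opt = torch.optim.Adam([w], lr=1e-2)
for epoch in range(50):
    gain = 1.0 + 0.5 * epoch            # anneal 1.0 -> 25.5
    loss = ((x @ progressive_binarize(w, gain) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(progressive_binarize(w, 25.5).detach().T)  # effectively binary
```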
Variation-aware binarized memristive networks
The quantization of weights to binary states in Deep Neural Networks (DNNs) can replace resource-hungry multiply-accumulate operations with simple accumulations. Such Binarized Neural Networks (BNNs) exhibit greatly reduced resource and power requirements. In addition, memristors have been shown to be promising synaptic weight elements in DNNs. In this paper, we propose and simulate novel Binarized Memristive Convolutional Neural Network (BMCNN) architectures employing hybrid weight and parameter representations. We train the proposed architectures offline and then map the trained parameters to our binarized memristive devices for inference. To take into account the variations in memristive devices, and to study their effect on performance, we introduce variations in R_ON and R_OFF. Moreover, we introduce means to mitigate the adverse effect of memristive variations in our proposed networks. Finally, we benchmark our BMCNNs and variation-aware BMCNNs using the MNIST dataset.
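A minimal NumPy sketch of the kind of variation modeling described above: binary weights are mapped to nominal R_ON/R_OFF states, and each device is perturbed log-normally. The resistance values, distribution, and spread are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_conductances(w_binary, r_on=1e4, r_off=1e6, sigma=0.2):
    """Map binary weights {-1, +1} to low/high resistance states and
    perturb each device log-normally to model device-to-device and
    cycle-to-cycle variation (sigma is an assumed spread)."""
    r_nominal = np.where(w_binary > 0, r_on, r_off)
    r_actual = r_nominal * rng.lognormal(mean=0.0, sigma=sigma,
                                         size=w_binary.shape)
    return 1.0 / r_actual  # device conductances (S)

w = np.sign(rng.standard_normal((8, 4)))  # binarized weights
g = sample_conductances(w)
currents = rng.uniform(0.0, 0.3, 8) @ g   # crossbar read-out (A)
print(currents)
```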
MemTorch: An Open-source Simulation Framework for Memristive Deep Learning Systems
Memristive devices have shown great promise to facilitate the acceleration
and improve the power efficiency of Deep Learning (DL) systems. Crossbar
architectures constructed using memristive devices can be used to efficiently
implement various in-memory computing operations, such as Multiply-Accumulate
(MAC) and unrolled-convolutions, which are used extensively in Deep Neural
Networks (DNNs) and Convolutional Neural Networks (CNNs). Currently, there is a
lack of a modernized, open source and general high-level simulation platform
that can fully integrate any behavioral or experimental memristive device model
and its putative non-idealities into crossbar architectures within DL systems.
This paper presents such a framework, entitled MemTorch, which adopts a
modernized software engineering methodology and integrates directly with the
well-known PyTorch Machine Learning (ML) library. We fully detail the public
release of MemTorch and its release management, and use it to perform novel
simulations of memristive DL systems, which are trained and benchmarked using
the CIFAR-10 dataset. Moreover, we present a case study, in which MemTorch is
used to simulate a near-sensor in-memory computing system for seizure detection
using Pt/Hf/Ti Resistive Random Access Memory (ReRAM) devices. Our open source
MemTorch framework can be used and expanded upon by circuit and system
designers to conveniently perform customized large-scale memristive DL
simulations taking into account various unavoidable device non-idealities, as a
preliminary step before circuit-level realization. (Submitted to IEEE Transactions on Neural Networks and Learning Systems.)
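For orientation, the snippet below follows the usage pattern shown in MemTorch's public README: a trained PyTorch model is patched so that its linear layers are simulated as memristive crossbar tiles. The keyword arguments and module paths are taken from the README and may differ between MemTorch releases, so treat this as a sketch rather than a pinned API reference:

```python
import copy
import torch
import memtorch
from memtorch.mn.Module import patch_model
from memtorch.map.Input import naive_scale
from memtorch.map.Parameter import naive_map

# A trained network to convert (placeholder architecture).
model = torch.nn.Sequential(torch.nn.Linear(784, 128),
                            torch.nn.ReLU(),
                            torch.nn.Linear(128, 10))

# Patch the Linear layers so their MACs are performed by simulated
# VTEAM-modeled memristive crossbar tiles with an 8-bit ADC model.
patched_model = patch_model(copy.deepcopy(model),
                            memristor_model=memtorch.bh.memristor.VTEAM,
                            memristor_model_params={'time_series_resolution': 1e-10},
                            module_parameters_to_patch=[torch.nn.Linear],
                            mapping_routine=naive_map,
                            transistor=True,
                            programming_routine=None,
                            tile_shape=(128, 128),
                            max_input_voltage=0.3,
                            scaling_routine=naive_scale,
                            ADC_resolution=8,
                            ADC_overflow_rate=0.0,
                            quant_method='linear')
patched_model.tune_()  # fit linear transforms between crossbar and ideal outputs
```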
Memristive Stochastic Computing for Deep Learning Parameter Optimization
Stochastic Computing (SC) is a computing paradigm that allows for the
low-cost and low-power computation of various arithmetic operations using
stochastic bit streams and digital logic. In contrast to conventional
representation schemes used within the binary domain, the sequence of bit
streams in the stochastic domain is inconsequential, and computation is usually
non-deterministic. In this brief, we exploit the stochasticity during switching
of probabilistic Conductive Bridging RAM (CBRAM) devices to efficiently
generate stochastic bit streams in order to perform Deep Learning (DL)
parameter optimization, reducing the size of Multiply and Accumulate (MAC)
units by 5 orders of magnitude. We demonstrate that in using a 40-nm
Complementary Metal Oxide Semiconductor (CMOS) process our scalable
architecture occupies 1.55 mm² and consumes approximately 167 µW when
optimizing parameters of a Convolutional Neural Network (CNN) while it is being
trained for a character recognition task, observing no notable reduction in
accuracy post-training. (Accepted by IEEE Transactions on Circuits and Systems II: Express Briefs.)
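The arithmetic that makes SC so cheap is easy to demonstrate: in the unipolar encoding, multiplying two values reduces to a bitwise AND of their bit streams. The sketch below uses a pseudorandom generator where the paper exploits CBRAM switching stochasticity as the entropy source:

```python
import numpy as np

rng = np.random.default_rng(0)

def to_stream(p, n_bits=4096):
    """Encode a value p in [0, 1] as a stochastic bit stream whose
    mean approximates p; the order of bits is inconsequential."""
    return rng.random(n_bits) < p

def sc_multiply(p_a, p_b, n_bits=4096):
    """Unipolar stochastic multiplication: one AND gate per bit pair
    replaces a full digital multiplier; precision scales with the
    stream length rather than with gate count."""
    product_stream = to_stream(p_a, n_bits) & to_stream(p_b, n_bits)
    return product_stream.mean()  # decode: fraction of ones

print(sc_multiply(0.5, 0.4))  # ~0.20, up to stochastic error
```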
Simulation of memristive crossbar arrays for seizure detection and prediction using parallel Convolutional Neural Networks
To address the computational bottleneck of the von Neumann architecture for epileptic seizure detection and prediction, we develop an in-memory memristive crossbar-based accelerator simulator. The simulator software is composed of a Python-based neural network training component and a MATLAB-based memristive crossbar array component. The software provides a baseline network for developing deep learning-based signal processing tasks, as well as a platform to investigate the impact of weight mapping schemes and of device and peripheral circuitry non-idealities.
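As an example of the weight mapping schemes and peripheral non-idealities such a simulator can investigate, the NumPy sketch below maps signed weights onto differential conductance pairs and quantizes the column read-out with a uniform ADC model; the conductance ranges and ADC resolution are illustrative assumptions:

```python
import numpy as np

def map_differential(w, g_min=1e-6, g_max=1e-4):
    """Map signed weights onto (G+, G-) conductance pairs so that
    w is proportional to G+ - G-; one of several mapping schemes
    such a simulator can compare."""
    scale = (g_max - g_min) / np.abs(w).max()
    g_pos = g_min + scale * np.clip(w, 0.0, None)
    g_neg = g_min + scale * np.clip(-w, 0.0, None)
    return g_pos, g_neg

def adc_quantize(i, n_bits=8):
    """Uniform ADC model for the column read-out; a simple stand-in
    for peripheral-circuitry non-idealities."""
    step = (i.max() - i.min()) / (2 ** n_bits - 1)
    return np.round((i - i.min()) / step) * step + i.min()

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 16))            # trained weights
g_pos, g_neg = map_differential(w)
v = rng.uniform(0.0, 0.2, 64)                # input voltages (V)
i_out = adc_quantize(v @ g_pos - v @ g_neg)  # quantized currents (A)
print(i_out[:4])
```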
Hardware Implementation of Deep Network Accelerators Towards Healthcare and Biomedical Applications
With the advent of dedicated Deep Learning (DL) accelerators and neuromorphic
processors, new opportunities are emerging for applying deep and Spiking Neural
Network (SNN) algorithms to healthcare and biomedical applications at the edge.
This can facilitate the advancement of the medical Internet of Things (IoT)
systems and Point of Care (PoC) devices. In this paper, we provide a tutorial
describing how various technologies ranging from emerging memristive devices,
to established Field Programmable Gate Arrays (FPGAs), and mature Complementary
Metal Oxide Semiconductor (CMOS) technology can be used to develop efficient DL
accelerators to solve a wide variety of diagnostic, pattern recognition, and
signal processing problems in healthcare. Furthermore, we explore how spiking
neuromorphic processors can complement their DL counterparts for processing
biomedical signals. After providing the required background, we unify the
sparsely distributed research on neural network and neuromorphic hardware
implementations as applied to the healthcare domain. In addition, we benchmark
various hardware platforms by performing a biomedical electromyography (EMG)
signal processing task and drawing comparisons among them in terms of inference
delay and energy. Finally, we provide our analysis of the field and share a
perspective on the advantages, disadvantages, challenges, and opportunities
that different accelerators and neuromorphic processors introduce to healthcare
and biomedical domains. This paper can serve a large audience, ranging from
nanoelectronics researchers to biomedical and healthcare practitioners, in
grasping the fundamental interplay between hardware, algorithms, and the
clinical adoption of these tools, as we shed light on the future of deep
networks and spiking neuromorphic processing systems as proponents for driving
biomedical circuits and systems forward. (Submitted to IEEE Transactions on
Biomedical Circuits and Systems.)